
Add Zamba2 #34517

Draft · wants to merge 56 commits into base: main
Conversation

@pglorio (Contributor) commented Oct 30, 2024

What does this PR do?

This PR adds support for the Zamba2 architecture created by Zyphra Technologies.

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

@pglorio pglorio marked this pull request as draft October 30, 2024 17:57
@pglorio (Contributor, Author) commented Nov 11, 2024

Hey @Arthur,

Thank you again for your help in getting Zamba2 into transformers! The PR is now finally ready to be reviewed. I added the documentation and all unit tests pass, including slow tests.

A few remarks, mostly related to modular transformers:

  1. To generate the modeling and configuration files I used utils/modular_model_converter.py from a previous commit, because the most recent version of the script (which followed a large refactoring) produces an error that I was not able to fix:
Converting src/transformers/models/zamba2/modular_zamba2.py to a single model single file format
Traceback (most recent call last):
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1510, in <module>
    converted_files = convert_modular_file(file_name, args.old_model_name, args.new_model_name)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1447, in convert_modular_file
    for file, module in create_modules(cst_transformers).items():
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1387, in create_modules
    nodes_to_add, file_type, new_imports = get_class_node_and_dependencies(modular_mapper, class_name, node, files)
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1337, in get_class_node_and_dependencies
    new_node_dependencies, new_imports = check_dependencies_and_create_import_node(
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in check_dependencies_and_create_import_node
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
  File "/workspace/transformers_zamba/utils/modular_model_converter.py", line 1283, in <setcomp>
    class_dependencies = {dep for dep in new_dependencies if m.matches(mapper.global_nodes[dep], m.ClassDef())}
KeyError: 'Zamba2Config'

I carefully compared Zamba2Config with the corresponding classes of other models that also use modular (such as Gemma2Config) and the format appears consistent; a minimal illustration of the failing lookup is sketched after this list. Relatedly, the utils/modular_model_converter.py in the current PR (path) is the version from the previous commit mentioned above.

  2. After running utils/modular_model_converter.py, the generated modeling and configuration files contain unintended code that I had to update; all these modifications are in this commit. In particular, the produced modeling file contains Zamba2DynamicCache, which is the correct cache for Zamba2, as well as HybridMambaAttentionDynamicCache, which is the cache of Zamba and is not relevant to Zamba2, so I deleted HybridMambaAttentionDynamicCache and related references.

  3. I ran make fixup and all zamba-related tests pass, with the exception of python utils/check_modular_conversion.py. This check fails because of the modifications mentioned in the previous point.

  4. I slightly edited the Zamba2MambaMixer compared to the original Mamba2Mixer of mamba2; the main difference is that I added these lines, which were necessary to appropriately process the mamba2 cache (note this step already existed in the torch forward in these lines).
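Regarding the KeyError in point 1, here is a minimal, self-contained sketch of the failing pattern (hypothetical stand-in data, not the converter's real state): the comprehension at line 1283 of the script indexes mapper.global_nodes directly, so a dependency name that was never registered there, apparently Zamba2Config in this run, raises KeyError instead of being filtered out.

# Hypothetical, simplified stand-in for the converter's state (not the real mapper):
global_nodes = {"Zamba2Model": "ClassDef(Zamba2Model)"}  # 'Zamba2Config' was never registered

def is_class_def(node: str) -> bool:
    # Stand-in for m.matches(node, m.ClassDef()) in the real script
    return node.startswith("ClassDef")

new_dependencies = {"Zamba2Model", "Zamba2Config"}

# Same shape as the failing comprehension: the lookup is unguarded, so the missing
# dependency raises KeyError: 'Zamba2Config' rather than being skipped.
class_dependencies = {dep for dep in new_dependencies if is_class_def(global_nodes[dep])}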

Looking forward to your feedback. Thanks so much!

@Cyrilvallez Cyrilvallez self-requested a review November 28, 2024 16:08
@Cyrilvallez (Member) commented Dec 3, 2024

Hey @pglorio, so sorry for the delay!! I think the first error you encounter comes from a wrong ruff version. Our CI uses ruff==0.5.1, so could you make sure to install the same and run it again?

EDIT: but actually, the only failing tests I see are Zamba2 tests

@Cyrilvallez (Member) left a review

Hey! Here is a first round of review comments; please check them out! For the lora layers, maybe rename them residual instead of lora to make it explicit that they are part of the original model?

You should also make sure that all the tests you added pass; this is currently not the case!

Review comments (now marked outdated/resolved) on:
  src/transformers/models/zamba2/__init__.py
  src/transformers/models/zamba2/modular_zamba2.py (8 comments)
@pglorio (Contributor, Author) commented Dec 5, 2024

Thank you @Cyrilvallez for the super helpful feedback! Please see the replies to each comment above.

It seems the zamba2-related tests all pass locally for us, but not in the CircleCI environment of the PR. The error is always

   1 failed because `IndexError` -> Dimension out of range (expected to be in range of [-2, 1], but got 2)
   9 failed because `RuntimeError` -> Sizes of tensors must match except in dimension 2. Expected size 8 but got size 1 for tensor number 1 in the list.

for all tests that fail. We investigated the discrepancy by examining more closely the test tests/models/zamba2/test_modeling_zamba2.py::Zamba2ModelTest::test_beam_sample_generate. We tried using pytest-flakefinder as suggested here, running python3 -m pytest -m 'generate' -n 8 -vvv --dist=loadfile --flake-finder --flake-runs=1000 --max-worker-restart=0 tests/models/zamba2/test_modeling_zamba2.py::Zamba2ModelTest::test_beam_sample_generate, but the test still passes. To the best of our knowledge this is the closest local reproduction of the CircleCI run. We'd appreciate a hint about what we can do to fix this discrepancy!
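For orientation (a generic PyTorch illustration, not the actual failing code in the Zamba2 mixer), the RuntimeError above is the message torch.cat produces when the tensors being concatenated disagree in a dimension other than the concatenation dimension:

import torch

a = torch.zeros(2, 8, 4)   # e.g. (batch, groups, state); illustrative shapes only
b = torch.zeros(2, 8, 1)
torch.cat([a, b], dim=2)   # fine: only the concat dimension differs

c = torch.zeros(2, 1, 4)
torch.cat([a, c], dim=2)   # RuntimeError: Sizes of tensors must match except in dimension 2.
                           # Expected size 8 but got size 1 for tensor number 1 in the list.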

Also, we are happy to rename the lora layers as you suggested here and agree it would clarify the role of this module. Would pretrained_lora also be a good option (residual might, for some people, refer to the identity stream)?

Thank you so much again!

@pglorio (Contributor, Author) commented Dec 7, 2024

We renamed lora -> adapter here and, following your suggestion in this comment, we simplified the adapter using nn.Sequential in this commit.
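For context, a minimal sketch of what a low-rank adapter expressed as a single nn.Sequential can look like (illustrative names and sizes, not the actual Zamba2 code):

import torch
import torch.nn as nn

hidden_size, rank = 64, 8  # illustrative sizes, not Zamba2's real configuration

# Down-projection followed by up-projection; the result is added to the output of the
# shared layer as a per-layer correction.
adapter = nn.Sequential(
    nn.Linear(hidden_size, rank, bias=False),
    nn.Linear(rank, hidden_size, bias=False),
)
shared_linear = nn.Linear(hidden_size, hidden_size, bias=False)

x = torch.randn(2, 10, hidden_size)
out = shared_linear(x) + adapter(x)
print(out.shape)  # torch.Size([2, 10, 64])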

We re-ran the model tests and they all pass locally, but it looks like they still fail in this PR environment.

@Cyrilvallez (Member) left a review

Nice! I pointed out some minor issues, but this is almost ready! 🤗 However, you might want to consider switching the attention implementation to our newest standard here #35235 (see e.g. the llama model as an example). This is not strictly needed, but should be quite straightforward from the examples, and will help both readability and maintenance in the long run. And since this change was introduced to make transformers compatible with high-performance inference frameworks such as vLLM and TGI, it could make Zamba2 compatible as well in the future (not sure how we will support stateful models, or whether the shapes in the Mixers are compatible, but it cannot hurt!).
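For reference, the dispatch pattern introduced by #35235 looks roughly like the sketch below, paraphrased from the llama example at the time; the exact names (ALL_ATTENTION_FUNCTIONS, config._attn_implementation) are an assumption about the then-current transformers codebase, not Zamba2 code:

from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS

def pick_attention_function(config, eager_attention_forward):
    # Each model keeps a plain eager implementation and dispatches every other backend
    # ("sdpa", "flash_attention_2", ...) through a shared registry keyed by the
    # configured attention implementation, instead of one attention subclass per backend.
    if config._attn_implementation == "eager":
        return eager_attention_forward
    return ALL_ATTENTION_FUNCTIONS[config._attn_implementation]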

Review comments (now marked outdated/resolved) on:
  src/transformers/models/zamba2/modular_zamba2.py (3 comments)
@Cyrilvallez (Member) commented

I cannot reproduce your issue with the tests, though; they are failing locally for me. Running CUDA_VISIBLE_DEVICES=2 pytest tests/models/zamba2/test_modeling_zamba2.py gives:
[screenshot of the failing test output, 2024-12-18]

@huggingface huggingface deleted a comment from github-actions bot Dec 18, 2024
@pglorio (Contributor, Author) commented Dec 20, 2024

Thank you so much for this feedback @Cyrilvallez. I realized that the issue with the unit tests was inside the torch_forward method of the mamba2 mixer (when I ran them locally, the unit tests used the cuda_kernels method instead). I fixed that method here: 1 2 3.
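As a rough, self-contained paraphrase of that dispatch (approximate logic, not the exact transformers code), the mixer only takes the fused-kernel path on GPU with the optional kernels installed, which is why local GPU runs and the CPU-only CI exercised different branches:

import torch

def pick_forward_path(device: torch.device, fast_path_available: bool) -> str:
    # Approximation of the mixer's branch selection: fused CUDA kernels on GPU when the
    # optional kernels are available, pure-PyTorch torch_forward otherwise.
    if fast_path_available and device.type == "cuda":
        return "cuda_kernels_forward"
    return "torch_forward"

print(pick_forward_path(torch.device("cpu"), fast_path_available=True))   # torch_forward (CI)
print(pick_forward_path(torch.device("cuda"), fast_path_available=True))  # cuda_kernels_forward (local GPU)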

By the way, we originally took the torch_forward from the mamba2 model, so the same issues hold there. In particular, running this:

import torch
from transformers import Mamba2Config, Mamba2ForCausalLM

config = Mamba2Config(
    num_heads=8,
    n_groups=8,
    state_size=2,
    head_dim=8,
    conv_kernel=4,
    chunk_size=8,
    vocab_size=99,
    hidden_size=32,
    num_hidden_layers=4,
    hidden_act="silu",
    hidden_dropout_prob=0.1,
    max_position_embeddings=512,
)
model = Mamba2ForCausalLM(config)

# Batch with padding positions encoded in the attention mask
inputs = {
    "input_ids": torch.tensor([[86,  6, 51,  3, 12, 15, 33, 18,  4, 92],
                               [69, 66, 49, 45, 48, 44, 61, 56, 68, 85]]),
    "attention_mask": torch.tensor([[0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
                                    [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]]),
}

# Greedy generation on CPU (torch_forward path) ...
outputs_cpu = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=False,
                             output_scores=False, use_cache=True, num_beams=1, do_sample=False)

# ... and on GPU (cuda kernels path), with the same inputs
model = model.to("cuda")
inputs = {key: tensor.to(device=model.device) for key, tensor in inputs.items()}
outputs_cuda = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=False,
                              output_scores=False, use_cache=True, num_beams=1, do_sample=False)

print(torch.all(outputs_cpu == outputs_cuda.cpu()).item())

returns False.
